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Abstract 

Background: The detection of change in magnitude of directional coupling 
between two non-linear time series is a common subject of interest in the 
biomedical domain, including studies involving the respiratory chemoreflex system. 
Although transfer entropy is a useful tool in this avenue, no study to date has 
investigated how different transfer entropy estimation methods perform in typical 
biomedical applications featuring small sample size and presence of outliers. 

Methods: With respect to detection of increased coupling strength, we compared 
three transfer entropy estimation techniques using both simulated time series and 
respiratory recordings from lambs. The following estimation methods were analyzed: 
fixed-binning with ranking, kernel density estimation (KDE), and the Darbellay-Vajda 
(D-V) adaptive partitioning algorithm extended to three dimensions. In the simulated 
experiment, sample size was varied from 50 to 200, while coupling strength was 
increased. In order to introduce outliers, the heavy-tailed Laplace distribution was 
utilized. In the lamb experiment, the objective was to detect increased respiratory- 
related chemosensitivity to 0 2 and C0 2 induced by a drug, domperidone. Specifically, 
the separate influence of end-tidal P0 2 and PC0 2 on minute ventilation (y £ ) before 
and after administration of domperidone was analyzed. 

Results: In the simulation, KDE detected increased coupling strength at the lowest 

SNR among the three methods. In the lamb experiment, D-V partitioning resulted in 

the statistically strongest increase in transfer entropy post-domperidone for 

P0 2 V E - ' n addition, D-V partitioning was the only method that could detect an 

increase in transfer entropy for PC0 2 -> V& i n agreement with experimental 

findings. 

Conclusions: Transfer entropy is capable of detecting directional coupling changes 
in non-linear biomedical time series analysis featuring a small number of 
observations and presence of outliers. The results of this study suggest that fixed- 
binning, even with ranking, is too primitive, and although there is no clear winner 
between KDE and D-V partitioning, the reader should note that KDE requires more 
computational time and extensive parameter selection than D-V partitioning. We 
hope this study provides a guideline for selection of an appropriate transfer entropy 
estimation method. 
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Introduction 

In multi-variable time series analysis, a common subject of interest is the coupling 
among the variables. One promising measure of the coupling strength between two 
time series is transfer entropy [1,2], which quantifies the amount of information trans- 
fer from one variable to the other. Importantly, transfer entropy is non-parametric and 
can capture non-linear coupling effects. This property can be useful in analyzing com- 
plex systems where interactions among sub-systems are expected to be non-linear and 
where minimal a priori knowledge is available. Furthermore, transfer entropy is an 
asymmetric measure that conveys directional information. Unlike mutual information, 
which can only quantify the amount of shared information between two variables, 
transfer entropy can elucidate directional relationships between variables. In fact, it has 
been shown that transfer entropy is a non-linear extension of Granger causality [3]. 

Given all these advantages, transfer entropy has been gaining popularity as a power- 
ful analytic tool for characterizing complex physiologic networks. For instance, this 
new method has been suggested [4] and successfully applied to electroencephalograms 
and magnetoencephalograms to help elucidate complex neural phenomena such as 
consciousness under anesthesia [5], sensory functional integration [6], auditory neural 
assembly [7], attention related processing [8], and sensor-motor connectivity [9]. In the 
area of cardiovascular physiology, Katura and colleagues [10] have utilized the techni- 
que of transfer entropy to elucidate (or reject) causal relationships that explain sources 
of variability in the regulation of cerebral hemodynamics. In addition, a number of stu- 
dies have employed variations of transfer entropy, such as cross-conditional entropy 
and conditional mutual information, to investigate relationships between cardiovascular 
variables as indicator of baroreflex function [11-14]. Conditional mutual information 
has been shown to be equivalent to transfer entropy [15]. 

In practice, one major challenge with transfer entropy estimation using biomedical 
signals is the estimation of Probability Density Functions (PDFs) from finite data with 
outliers. This estimation challenge is well discussed in [2,16]. Small sample size can 
stem not only from experimental and/or technical constraints but also from the need 
for stationarity; a non-stationary time series may have to be broken into quasi-station- 
ary segments before analysis. This sample size issue can be a major hindrance for bio- 
medical studies where data collection is often costly. Also, outliers frequently originate 
from either noise sources or underlying physiology. 

Therefore, our objective in this study was to investigate several transfer entropy esti- 
mation methodologies in the context of biomedical signals with small sample size and 
with outliers. This study can be viewed as an extension of the study on mutual infor- 
mation estimation (two dimensions) by Khan and colleagues [17] to transfer entropy 
estimation (three dimensions). Furthermore, unlike most other transfer entropy appli- 
cations, we focused on the ability to detect increases in coupling strength, rather than 
on detection of the existence of a significant coupling or on identification of time lags 
at which significant couplings exist. This is a common research topic of interest when 
one wishes to study how a given coupling varies in magnitude under different condi- 
tions while the existence of the coupling is known a priori (our application to respira- 
tory physiology in this study is one example). To the best of our knowledge, no study 
has attempted to investigate the sensitivity of various transfer entropy estimation meth- 
ods to changes in coupling strength in the biomedical domain. 
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Specifically, we studied three non-parametric techniques in this endeavor: fixed-bins, 
Kernel Density Estimation (KDE), and adaptive partitioning. The fixed-bin approach, in 
which a given time series is quantized by dividing the dynamic range into equally- 
spaced bins, is the simplest method that boasts the least computational complexity at 
the cost of inefficient bin allocation [16]. KDE has been shown to perform well in 
mutual information estimation using short time series [17] and hence was included in 
our study. For adaptive partitioning, we introduce a novel method that applies the Dar- 
bellay-Vajda (D-V) partitioning algorithm [18] to transfer entropy estimation. The D-V 
partitioning algorithm is equivalent to the adaptive partitioning introduced by Fraser 
and Swinney [2,19] and has been shown to be effective in mutual information estima- 
tion [20]. 

The comparison among the three transfer entropy estimation techniques was accom- 
plished via both simulated and real-life signals. In both experiments, we analyzed the 
sensitivity of the techniques to increased non-linear coupling strength, under small 
sample size and presence of outliers. The purpose of the simulated experiment was to 
perform a rigorous comparison in a controlled setting. In order to address any cou- 
pling characteristics that were not represented by the simulated data, we also applied 
the transfer entropy estimation methods to measurements of spontaneous breathing 
respiratory flow, end-tidal carbon dioxide pressure (PC0 2 ) and end tidal oxygen pres- 
sure (P0 2 ), collected from lambs [21]. The objective of the lamb experiment was to 
detect changes in chemosensitivity, which is of major clinical interest. The method of 
transfer entropy provides a natural framework for quantification of directional influ- 
ence that breath-by-breath changes in blood gases have on ventilation (via peripheral 
chemoreceptors located in the aortic arch and carotid bodies), while taking into 
account the delayed nature of the response (due to lung-to-chemoreceptors transport 
time, typically on the order of 2-4 breaths) and possible non-linearities [22]. Ventila- 
tory control instabilities are important in a variety of pathological conditions, including 
Cheyne-Stokes breathing in congestive heart failure [23,24] and obstructive sleep apnea 
[25-27]. Although the specific mechanisms underlying each condition may vary, 
increases in the ventilatory sensitivity to hypoxia/hypercapnia (controller gain) have 
been shown to play a critical role in the pathogenesis of unstable breathing. An ele- 
vated hypoxic ventilatory response at high altitude is associated with the presence of 
unstable breathing [28]. Moreover, heightened dynamic hypercapnic responses deline- 
ate heart failure patients with periodic breathing (in the form of Cheyne-Stokes 
respiration) from those patients that exhibit stable breathing [29-31]. Additionally, in 
heart failure patients, elevated hypoxic and/or hypercapnic sensitivity [32] and the con- 
sequent presence of Cheyne-Stokes respiration itself [33] are important predictors of 
mortality. 

Methods 

Transfer Entropy Construction 

Given two concurrently sampled time series X = {x\, x 2 , ■ ■ •, x N } and Y = \y x , y 2 , ■ ■ ., 
y N }, the transfer entropy from X to Y, termed T x ^y> can be derived from conditional 
entropies as follows: 

T x ^y = H[y i \yf_ t )-H{y i \y® t , X W x ) (1) 
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„ / m m x P[Yi\Yi-t' X i-r) 

where i indicates a given point in time, r and t are the time lags in X and Y, respec- 
tively, and k and / are the block lengths of past values in X and Y, respectively (see 
Appendix I for a complete derivation). The past values on which the conditional prob- 
abilities in (2) are conditioned are x f} r = {Xj_ T _fe +1( Xi_ T _fe +2 , *i- T } and 
y-y = {y,-_ t _/ + i, yi- t -i+2, ■ ■ ■ , Yi-t}- ^ n words, (1) implies that transfer entropy measures 
the reduction in uncertainty in y t given an< i y-_, m comparison with given only 
. Note that cannot be negative because H(yi\yf} t ) > H(y^ |y£^ c , x-^ ) in (1); 

conditioning on another variable, cannot increase the uncertainty in j,. (Note that 
if base 2 logarithm is used, the unit of transfer entropy is bits, which has the interpre- 
tation of the amount of reduction in the average length of the optimal code needed to 
encode the target variable.) 

Equation (2) is the most general definition of transfer entropy and provides maxi- 
mum flexibility. However, the usual parameter choices in practice are k = 1 and 1=1 
for computational reasons when sample size is small. In addition, here we set t = 1 
under the assumption that the maximum auto-transfer of information occurs from the 
data point immediately before the target value in Y. For example, Francis et al. only 
considered lags of one breath in ventilation and C0 2 for the same reason [30]. These 
choices of k = I = t = 1 are especially appropriate in biomedical experiments where 
time series length is usually short and the absolute values of auto-correlation functions 
tend to decrease monotonically as time lag increases. Also, the original definition in 
[1] implicitly assumed t = 1 by using a Markov chain; our choice can be seen as a 
first-order Markov process. Hence, we restrict the focus of this paper to these usual 
parameter choices by simplifying (2) to 

t I \ ~t 11 P(Yi\Yi-l>Xi-r) 

Tx.yM- E POW^^log (3) 

yayi—ii^i—x 

\£ x pfyi-i'V-xMYvYi-iY (4) 

Note that (3) and (4) explicitly show transfer entropy as a function of r. 

In practice, the foremost challenge in transfer entropy computation is estimating the 
joint probabilities in (4). The ensuing section describes the three different PDF estima- 
tion methodologies to be investigated in this study. 

Probability Density Function Estimation Methodologies 
Fixed Bins 

The simplest estimation approach to obtain the PDFs in (4) is to allocate data points 
to fixed, equally-spaced bins. In order to enhance robustness against outliers and 
sparse regions in the underlying distribution, we combined fixed binning with ordinal 
sampling (ranking). In ordinal sampling, the time series values in X and Y are substi- 
tuted with their ranks in sorted X and Y , similar to most non-parametric statistical 
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tests. The ranks are integers ranging from 1 (smallest value) to N (largest value). Let 
the transformed time series of X and Y be U = {u lt u 2 , ■ • •, u N } and V = {v x , v 2 , . . ., 
v N }, respectively (U and V contain integers that represent ranks). 
Then, (4) can be rewritten in terms of U and V as follows: 

Tu^v{t)= }^ P fa u -01og-7 tJ- 1 {■ (5) 

„.,„~_ t p{vt-i,Ui-^yp{vi,Vi-i) 

Using the same number of bins, Q, in each dimension for simplicity, the fixed-bin 
technique estimates (5) as follows: 

Q 

t ( \ ~ \ - , m a ,b,cm b 

0=1,0=1,0=1 

where a, and c index bins along v„ Vj.1, and respectively, and P is the total 
number of triplets of v„ Vn, and «i. r . Furthermore, m ai b, c , m ai b, and w^ jC represent the 
number of data points in the intersection of the one-dimensional bins denoted by the 
subscript, whereas mf, is the number of data points in the bth bin along the v,_i 
dimension. 

Kernel Density Estimation 

In KDE, the PDF is estimated by summing individual distributions centered at each 
data point. The shape of the individual distributions is dictated by a chosen kernel, the 
magnitude of which decreases as the distance from the center of the distribution 
increases. Since distance is a crucial measure in KDE, ordinal sampling is not 
employed. In the three-dimensional space of y it y^i, and x^„ the joint probability at an 
arbitrary point, {fi, y;_i,X;_ T ), can be estimated by normalizing the following: 

i i K / yt-nA -r <-i,j\ ~ 



where / indexes the P data points and /z(.) is the bandwidth for the dimension speci- 
fied in the subscript. We utilized a scaled version of the rule of thumb specified in [34] 
to select the bandwidth: 

h H = l.ObaoP- 1 ! 5 , (8) 

where a is a multiplier for scaling and a is the sample standard deviation in the 
given dimension. In terms of K, several possible kernels exist but we employed the 
widely used Gaussian kernel in this study, which is defined as follows: 

K{u) = — L<r° 5 " 2 (9) 
The rest of the probabilities in (4) can be computed by marginalizing 

Pifi, ft-l, Xi-r)- 

Adaptive Partitioning using the Darbellay-Vajda Algorithm 

We introduce an extension of the D-V algorithm to three-dimensional space for trans- 
fer entropy estimation. After ordinal sampling, the D-V algorithm recursively partitions 
the three-dimensional space defined by v it v,-_i, and W;_ r into cubes of varying sizes. 
Initially, the entire space is sliced into 8 equal cubes, where the boundaries are at the 
mid-points in the three dimensions. Using the 8 cubes, the following ^ 2 statistic is 
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computed to test the null hypothesis that data points are evenly distributed across the 
8 cubes: 

8 

h 2 = J2( M > - ^ 2 > (10) 

i=l 

where M lt M 2 , ■ ■ ., M s are the numbers of data points contained in the 8 cubes and 
fi M is the total number of data points divided by 8 (number of cubes). If s x 2 > X$5% (?) 
(at a 5% significance level and 7 degrees of freedom), then the null hypothesis is 
rejected and each of the 8 cubes is further partitioned into 8 smaller cubes in the same 
manner, and the recursion continues. If the null hypothesis is not rejected, the parti- 
tions in the current iteration are discarded and the current 8 cubes under scrutiny as a 
whole are taken as one partition. Cubes that contain no data point do not contribute 
to the transfer entropy estimation. 

Appendix II shows a two-dimensional analogy of a partitioned space. It was gener- 
ated based on 1,000 data points randomly sampled from a bivariate Gaussian distribu- 
tion with covariance O xy = 0.9 and unit variance, ct a 2 = 1 and ffy = 1. Note that many 
squares with smaller sizes are allocated to densely populated parts of the space, 
whereas fewer partitions are created in sparse areas. 

The result of the partitioning process is a finite number of cubes, L, each of which 
contains nonzero data points. By approximating each probability in (5) by counting, 
T u ^, v (t) can now be estimated using the L partitions as follows: 

WO* Eg log n jffiL . (ID 

where P is the total number of data triplets (v„ v^i, and u^ T ), is the number of 
data points in the /cth partition, and n fe ' _1 , n^""' _I , and y\"" ,_1 are the numbers of data 
points in the entire data that are greater than or equal to lower bounds and less than 
upper bounds of the /cth partition with respect to the dimensions in the superscript. 
For example, if the kth partition's lower and upper limits in v,-^ are 5 and 10, then n^ 1 
is the total number of data points such that 5 < < 10, regardless of their v, and M;_ r 
coordinates. 

Our transfer entropy estimation method with D-V partitioning has been implemen- 
ted in MATLAB, and we made the code and an example data set (from the simulated 
experiment described in the next section) publicly available at PhysioNet [35] (http:// 
physionet.org). 

Experiments 
Simulation 

The source time series, X = {x lt x 2 , ■ ■ ., x N+2 } and the corresponding sink time series, Y 
= iyi> J2> • • •> Jn+2Si were generated as follows: 



Lee et al. BioMedical Engineering Online 2012, 11:19 
http://www.biomedical-engineering-online.eom/content/1 1/1/19 



Page 7 of 1 7 



where the signal {s Xti and s y J) and noise components (v x i and v yii ) are 
s*,i~ AA(10,1) 

Vx,j ~ £(0, 

Vyj ~ £(0, by), 

where ~ denotes sampling from a distribution, a is a coupling constant, and J\f and 
£ represent the Gaussian (mean, variance) and Laplace (mean, scale) distributions, 
respectively. The Laplace distribution is an example of a heavy-tailed statistic and thus 
can better model the type of noise often encountered in physiological signals compared 
with Gaussian noise (e.g., occasional sighs and apneas in a ventilation time series or 
muscle artifacts in electrocardiography (ECG) recordings [36]). Equation (12) imposed 
a nonlinear relationship between X and Y (via the squaring) at a time lag of two. Sam- 
ple size, N, was varied from 50 to 200 at a step of 50, representing the focus of this 
experiment on small sample size. Note that the lengths of X and Y were N + 2 due to 
the lag of 2; x N+i , x N+2 , J\, and y 2 were discarded during transfer entropy estimation. 

The coupling strength between X and Y was varied by increasing the signal-to-noise 
ratio (SNR) in both time series from 10 to 20 dB in steps of 1 dB. This increase in 
SNR was induced by increasing the coupling constant a. In order to maintain the same 
SNR in both X and Y , the noise power in X was gradually reduced by decreasing b x 
while holding the signal power constant, whereas the signal power in Y was gradually 
increased by increasing a accordingly while holding the noise power (i.e., b y ) constant. 

As the first step, we analyzed transfer entropy vs. time lag (r, from 0 to 5 in steps of 
1) with SNR = 20 dB and N = 200 using all three estimation methods. This analysis 
served not only as a comparison among the methods but also as a parameter selection 
step for the fixed-bin (Q was varied from 4 to 10 in steps of 2) and KDE (a was varied 
from 0.5 to 1.5 in steps of 0.25) methods. 

With the selected values for the parameters Q and a, X and Y were randomly gener- 
ated 100 times at each SNR level and the corresponding transfer entropies were com- 
puted using all three methods. Subsequently, one-sided Wilcoxon rank sum tests were 
conducted to judge whether there was a significant increase in transfer entropy at each 
step of increase in SNR with p < 0.05 considered to be statistically significant. 
An Application to Respiratory Physiology 

Using the three transfer entropy estimation methods, we quantified the impact of the 
administration of domperidone, a dopamine D 2 -receptor antagonist that increases car- 
otid body sensitivity to 0 2 and C0 2 [21]. We performed our analysis on recordings 
from 14 newborn lambs obtained at Monash University, Australia [21]. All surgical 
and experimental procedures conformed to the guidelines of the National Health and 
Medical Research Council of Australia and had the approval of the Standing Commit- 
tee in Ethics in Animal Experimentation of Monash University. 

Measurements of spontaneous breathing respiratory flow, end-tidal PCO2 and end- 
tidal P0 2 were obtained for a period of approximately 10 minutes before and following 
the intravenous administration of domperidone. Minute ventilation (Ve) was calculated 
for each breath as V T IT tot , where V T is tidal volume and T tot is breath duration (see 
Figure 1 for example waveforms and derived time series). Breath-to-breath time series 
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Figure 1 Lamb experiment: example waveforms and derived time series Panels A, B, and C show 
examples of tidal volume, PC0 2 and P0 2 waveforms, respectively. The onset and offset of each inspiration 
are marked with an open circle in panel A. The y E time series (in panel D) was derived from the tidal 
volume waveform as described in the text. The end-tidal PC0 2 {ET PC0 2 in panel E) and end-tidal P0 2 (ET 
P0 2 in panel F) time series were extracted from the measurements of PC0 2 and P0 2 in the end-expiratory 
air, respectively, as indicated by the open circles in panels B and C. 



data of V E , end-tidal PC0 2 and end-tidal P0 2 were further band-pass filtered to remove 
any oscillations slower than (i.e., more breaths/cycle than) 20 breaths/cycle or faster 
than (i.e., fewer breaths/cycle than) 5 breaths/cycle (using a 7th-order Butterworth digi- 
tal filter with band-pass frequency of 0.05-0.2 cycles/breath). The frequencies included 
in the resulting band-passed time series are consistent with the range of oscillations 
often observed during periodic breathing [21,25]. All extracted time series included at 
least ~ 200 data points. In a few cases where longer observations were available, the time 
series were divided into non-overlapping windows of size ~ 200 data points and the win- 
dow with the largest variance was selected for further analysis. Our rationale behind this 
choice of data segment selection was to characterize the system at its most variable (or 
unstable) state during spontaneous breathing (see [37] for more information). 

The three transfer entropy estimation methods were applied to the V E , P0 2 , and 
PC0 2 time series from each lamb. Specifically, the following two directional informa- 
tion flows were investigated before and after the administration of domperidone: 
P0 2 —> Ve an d PCO2 —> Ve- Unlike the simulation experiment, the time lag r at which 
a significant coupling exists was unknown a priori. Hence, r was varied between one 
and five breaths and the results for the time lag at which the value of transfer entropy 
was the largest and statistically significant were reported. Significant information flows 
were determined by Monte Carlo surrogates, i.e., temporally shuffled time series. In 
this experiment, transfer entropies were computed for 100 surrogates of the source 
time series {P0 2 or PC0 2 ), and the transfer entropy from the original source time ser- 
ies was deemed to be significant if it was greater than the 95th percentile of the surro- 
gate results (as described in [38]). If no r value resulted in a significant information 
flow, that particular time series pair was removed from further analysis. There was dif- 
ficulty in selecting optimal Q and a values for the fixed-bin and KDE methods, respec- 
tively, due to the lack of a priori knowledge regarding coupling time lag. Therefore, 
arbitrary, but reasonable, choices of Q = 5 and a = 1 were made. Finally, increases in 
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coupling strength caused by domperidone were tested using the Wilcoxon signed rank 
sum test. 

Results 

Simulated Experiment 

Figure 2 shows how the three transfer entropy estimation methods performed when r 
was varied from 0 to 5 with N = 200 and SNR = 20 dB in the simulated experiment. 
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Q=6 
Q=8 
Q=10 



0 1 2 3 4 5 



KDE: various bandwidths 




-a=0.5 
a=0.75 

■ a=1 
a=1.25 
a=1.5 



All three methods with selected parameters 




Fixed-bin, Q=4 
KDE, of1.5 
D-V Partitioning 



Figure 2 Simulated experiment: parameter selection for the fixed-bin (panel A) and KDE (panel B) 
methods, and a comparison among the three transfer entropy estimation methods (panel C). Each 
panel shows a plot of median transfer entropy vs. time lag (r), with SNR = 20 dB, N = 200, and 100 trials. 
Significant information transfer occurs only at r = 2. The error bars represent interquartile range. Panels A 
and B show that sub-optimal parameters led to inflated transfer entropies at time lags of no coupling. 
Panel C compares the three methods after selecting Q = 4 and a = 1.5. 
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Panels A and B illustrate transfer entropy estimation for different values of Q and a for 
the fixed-bin and KDE methods, respectively. Since one expects to see near zero transfer 
entropies at r values other than 2, Q = 4 and a = 1.5 were selected for the rest of the 
experiment. Panel C of Figure 2 compares the three methods using the selected para- 
meter values. All three methods allowed one to visually distinguish t = 1 from other 
time lags. KDE was better at estimating low transfer entropies at r * 2 at the cost of a 
reduced transfer entropy at r = 2. Fixed-binning and D-V partitioning performed com- 
parably, although D-V partitioning yielded lower transfer entropies at r * 2. 

The main results of the simulated experiment are shown in Figure 3 which illustrates 
the estimated transfer entropies from the 100 trials as a function of SNR. Following 
from Figure 2, Q = 4 and a = 1.5 for the fixed-bin and KDE methods, respectively. 
Overall, the fixed-bin method estimated higher transfer entropies at all lags than the 
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Figure 3 Simulated experiment: transfer entropy as a function of SNR. A comparison among the 
fixed-bin with ranking, KDE, and D-V partitioning methods. Based on 100 trials, each line shows median 
transfer entropy as a function of SNR, with four different sample sizes in separate panels. The error bars 
represent interquartile ranges. SNR represents both coupling strength and the contribution from outliers, 
and transfer entropy is expected to increase with increasing SNR. The corresponding one-sided rank sum 
test results are shown in Table 1. 
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Table 1 Simulated experiment: minimum SNR associated with sustained detection of 
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The minimum SNR at and beyond which statistically significant increases in transfer entropy are sustained according to 
the one-sided rank sum test. The two compared SNR levels are tabulated. See Figure 3 for visualization. 



other two methods, which was improved as N increased. KDE showed the smallest 
dynamic range in transfer entropy as SNR increased but was also associated with the 
smallest estimation variance at any given SNR. In fact, estimation variance with KDE 
was similar across different sample sizes, whereas the other two methods resulted in 
improved estimation variance as N increased. 

Table 1 tabulates the one-sided rank sum test results of Figure 3 for detection of sta- 
tistically significant increases in coupling strength. The two compared SNR levels are 
shown and statistically significant increases in transfer entropy were found to be sus- 
tained at and beyond the indicated SNR levels. In Table 1 all methods show the trend 
of detecting increased coupling at a lower SNR as N increases. Overall, KDE outper- 
formed the other two methods in sensitivity to increasing coupling strength; for N > 
100, KDE was able to distinguish 11 dB {a = 1.059) and 12 dB (a = 1.122). Although 
D-V partitioning was able to detect increases in transfer entropy at a lower SNR than 
the fixed-bin method when N = 50, the fixed-bin method slightly outperformed D-V 
partitioning at other N values. 

Respiratory Physiology Experiment 

Figure 4 illustrates how the estimated transfer entropy values changed for PCO2 Ve 
and PO2 —> Ve (first and second row, respectively), before ("Control") and after ("Dom- 
peridone") the administration of domperidone. The first, second, and third columns 
correspond to the fixed-bin, KDE, and D-V partitioning methods, respectively. Each 
closed circle represents one lamb, and the pre- and post-domperidone transfer entro- 
pies from the same lamb are connected with a straight line. Hence, we expect to see as 
many lines with a positive slope as possible. The number of subjects (N) varied among 
the different estimation methods as well as between pre- and post- domperidone since 
some cases did not pass the surrogate significance test. The group statistics are shown 
by the box plots. 

Figure 4 shows that all three techniques were able to detect a statistically significant 
increase in transfer entropy for PO2 — > Ve (interpreted as increased chemosensitivity) 
post-domperidone. However, the D-V partitioning technique indicated the statistically 
strongest increase in transfer entropy following domperidone administration (p < 0.01). 
Moreover, only D-V partitioning detected a significant increase in the PCO2 — > Ve 
chemosensitivity post-domperidone (p < 0.05). 

There are three other notable findings in Figure 4. First, fixed-binning led to higher 
transfer entropy values for both control and domperidone cases than the other two 
techniques. Second, unlike the simulation results, KDE did not show smaller estimation 
variance than the other two methods. Third, domperidone administration increased the 
range in transfer entropy in most panels in Figure 4. 
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Control Domperidone Control Domperidone Control Domperidone 



Figure 4 Lamb experiment: comparison of the three transfer entropy estimation techniques. Results 
from applying the fixed-bin, kernel density estimation, and adaptive partitioning techniques are presented 
in the first, second, and third columns, respectively. Within each panel, transfer of entropy during baseline 
(Control) and after drug administration (Domperidone) are compared. Each closed circle represents one 
animal, and the box-plots capture the group statistics (median and upper and lower quartiles). The control 
and post-domperidone transfer entropies from the same lamb are connected with a straight line; we 
expected to see as many lines with a positive slope as possible. The numbers above each box-plot indicate 
the total number of animals who passed the surrogate test of significance, t and tt indicate p < 0.05 and 
p < 0.01, respectively. Note that all three techniques indicate a statistically significant increase in P0 2 
chemosensitivity following domperidone administration; fixed-bin method in panel B: 0.55 bits [0.48 0.65] 
([interquartile-range]) to 0.74 bits [0.58 0.83] (p < 0.05), kernel density estimation in panel D: 0.25 bits [0.20 
0.35] to 0.40 bits [0.21 0.54] (p < 0.05), and adaptive partitioning in panel F: 0.37 bits [0.30 0.41] to 0.54 bits 
[0.40 0.73] (p < 0.01). Note that the partitioning technique indicated the largest increase in transfer entropy 
post-domperidone. Furthermore, only the partitioning technique revealed a significant increase in PC0 2 
chemosensitivity post-domperidone; panel E: 0.29 bits [0.24 0.32] to 0.36 bits [0.25 0.49] (p < 0.05). 



Discussion 

In this paper, D-V partitioning was extended to three dimensions and, along with 
ranking, was used in the framework of transfer entropy estimation. Also, a comparison 
among three transfer entropy estimation techniques (namely traditional fixed-binning 
with ranking, state-of-the-art KDE, and adaptive partitioning) was conducted via a 
simulated experiment and an application to respiratory physiology. The focus of both 
studies was to investigate the methods' ability to detect increases in directional cou- 
pling strength under small sample size and outliers which are commonly encountered 
in biomedical studies. Although KDE detected increases in coupling strength at the 
lowest SNR level across different sample sizes in the simulation, D-V partitioning out- 
performed the other two methods in detecting increased chemosensitivity caused by 
domperidone in the lamb experiment. The respiratory chemoreflex system was a suita- 
ble example of non-linearity and non-stationarity. Several authors have reported evi- 
dence for non-linearity in the respiratory chemoreflex system [22,39]. Non-stationarity 
may also arise as a result of fluctuations in other variables/parameters such as cardiac 
output, respiratory rate, sleep-state, arousal-related changes in respiratory mechanics 
and the chemical control system, behavioral factors, or interventions. The application 
to respiratory physiology in this article demonstrated that the adaptive-partitioning- 
based transfer entropy estimation technique was more sensitive to increases in 
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coupling strength for both the PC0 2 and P0 2 chemosensitivity than the other two 
methods. These results, which are based on non-invasive measurements of sponta- 
neous breathing ventilatory variables, are in agreement with the experimental findings 
of Edwards and colleagues [21] based on the traditional hypercapnic and hypoxic 
responses (i.e., ratio of change in ventilation in response to an incremental increase in 
PCO2 or decrease in PO2 in the inspired air). Therefore, transfer entropy has the 
potential to be a useful surrogate marker of chemosensitivity, and may provide physiol- 
ogists and physicians with a non-invasive tool to track patient response to pharmacolo- 
gical interventions (e.g., Acetazolamide). 

Strictly speaking, transfer entropy measures the coupling strength of a causal link 
between two time series at a specific time lag and thus makes it difficult to attribute 
any observed change in transfer entropy to changes in chemosensitivity. For example, 
Porta et al. have also shown that cross-conditional entropy can be high with a rela- 
tively low level of baroreflex sensitivity [11]. Thus, any interchangeability in the cur- 
rent study between transfer entropy and chemosensitivity requires further 
investigation, since it is possible to increase transfer entropy by, for example, improv- 
ing the SNR in the data acquisition system under the same level of chemosensitivity 
or changing the operating point of the feedback system in the direction of an 
improved SNR. The simulation corroborated the lamb experiment by enabling a con- 
trolled comparative analysis. For example, unlike the lamb experiment, the simulated 
experiment allowed us to control the amount of noise and forced the signals to be sta- 
tionary throughout. Furthermore, the square function and Laplace noise simulated a 
non-linear coupling relationship and real-life noise, respectively, that are easy to inter- 
pret and control. 

Both fixed-binning and D-V partitioning incorporated ordinal sampling, or ranking, 
which should have at least partially dealt with outliers and multi-modal PDFs. The 
only difference between the two methods is in how the three-dimensional space is par- 
titioned, and here the efficient bin allocation of the D-V partitioning algorithm 
becomes a substantial advantage. Under non-uniform distributions that exhibit multi- 
ple peaks or varying concentrations of data points, fixed-binning is destined to func- 
tion inefficiently by allocating fixed-size bins from the minimum to maximum 
observed values. On the other hand, the D-V partitioning adaptively adjusts bin size 
according to how evenly data are distributed in a given sub-region of the data space. 
The expected effectiveness of adaptive partitioning increases with greater deviation 
from a uniform distribution, and this advantage could partially explain the improved 
performance of the D-V partitioning method in the lamb experiment in comparison 
with the simulation. 

Moreover, there are minimal adjustable parameters in the D-V partitioning algorithm 
(one can choose to change the significance level of the % 2 test), whereas the fixed-bin 
and KDE methods require one to choose the number of bins and the bandwidth of the 
kernel a priori. In fact, KDE allows one more degree of freedom: selection of the kernel 
function, which was not analyzed in this study. Parameter selection can be challenging 
especially with real-life data where the true coupling characteristics (e.g., strength, 
direction, time lag, etc.) are often unknown. With the D-V partitioning technique, one 
can bypass the often arbitrary step of model parameter selection, and this can be a 
substantial advantage in practice. 
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The block lengths k and / as well as the time lag t were fixed at 1 in the current 
study similar to previous studies quoting computational reasons, despite the known 
limitation that transfer entropy is expected to be underestimated [1,38]. In cases where 
data size is sufficiently large and computational tractability is not a concern, one can 
attempt to optimize k, I, and t. For instance, Faes et al. discussed a greedy approach 
for conditional entropy that selects an optimal conditional pattern consisting of past 
data from two or more related signals [12]. Porta et al. studied the effects of condi- 
tional pattern length on cross-conditional entropy [40]. With respect to computational 
time, the three different methods are ranked in the following ascending order: fixed- 
bin, D-V partitioning, and KDE. Although KDE performed very well in the simulated 
experiment, the necessity to compute the distance between a given point and every 
point in the data set along each dimension makes it computationally intensive. How- 
ever, it is worthwhile noting that computational time is not a major limiting factor in 
most research studies that only require retrospective analysis. 

In a number of biomedical studies, small sample size and presence of outliers are 
often closely associated with physiological dynamics. A common focus of many 
research studies is to investigate how physiological systems behave under different con- 
ditions or respond to external interventions, naturally formulating a non-stationary 
problem. One way to analyze a non-stationary time series is to segment them into 
quasi-stationary intervals. This limits the number of data points in each segment and 
leads to the small sample size problem. This can be a major challenge to many para- 
meter estimation algorithms in presence of noise, because they often require suffi- 
ciently long observations. In addition, physiological systems tend to exhibit non-linear 
behaviors which may cause the state of the system to abruptly change from its steady- 
state baseline, producing outliers. These outliers, originating from underlying physiolo- 
gical phenomena rather than noise, convey meaningful value and hence should not be 
removed for mere computational convenience. We remind the reader that our simu- 
lated experiment explicitly investigated the issues of small sample size and of outliers. 

Conclusions 

In this article, we have shown that transfer entropy can detect changes in directional 
coupling between two biomedical time series. We have extended D-V partitioning to 
transfer entropy estimation and compared the performance of three transfer entropy 
estimation methods in detection of increased coupling strength: fixed-binning with rank- 
ing, KDE, and D-V partitioning with ranking. Based on the results of this study, fixed- 
binning, even with ranking, failed to clearly outperform the other two methods, while 
the comparison between KDE and D-V partitioning was inconclusive. However, D-V 
partitioning may be the most attractive option after taking into account computational 
time and the difficulty associated with parameter selection. We hope that this study pro- 
vides a helpful guideline in selecting an appropriate transfer entropy estimation method. 

Appendix I 

Given two concurrently sampled time series X = {x lt x 2 , ■ • •, x N } and Y = \y lt y 2 , ■ ■ ., 
y N }, the conditional entropy H{xi \ yj) is defined as: 
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Note that H(xi \ y^) > 0. Then, the transfer entropy from X to Y, termed T x ^y, can 
be derived from conditional entropies as follows: 

Tx^v = H{ Yl \yf_ t ) - H{ n \y® t , xf\) 

- Y to y m ) log -i^sL _ y piyi S #) ) log (13) 

But since p(y,, y|_ £ ) = X!/'' Pfyii Yi-t> x i-r)> t ne two arguments in the separate sum- 
mations can be combined. Thus, 

j,. jy ffl (X w PW' Yi-tiPWi-t' x i-z 1 



pi Co 



Appendix II 

Figure 5 shows a two-dimensional visualization of D-V partitioning. The space con- 
tains 1,000 data points sampled from a bivariate Gaussian distribution with a xy = 0.9, 

ffy = 1, and cr 2 = 1. This figure shows the transformed data space after ordinal 
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Figure 5 A two-dimensional visualization of D-V partitioning 
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sampling. The proposed transfer entopy estimation with the D-V partitioning algo- 
rithm extends this kind of partitioning to three dimensions. 
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